3622: [C] Fix negative length validation in binary decoding by martin-augment · Pull Request #27 · martin-augment/avro

martin-augment · 2026-01-13T20:34:00Z

3622: To review by AI

The read_bytes() and read_string() functions in encoding_binary.c decode length values using zigzag encoding, which can produce negative numbers from malicious input. These negative values were passed directly to avro_malloc(), causing allocation failures or undefined behavior. This patch adds validation to reject negative length values with a clear error message before attempting memory allocation. Bug: Negative length values from varint decoding cause allocation-size-too-big when cast to size_t Impact: DoS via crafted binary input Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai · 2026-01-13T20:34:47Z

Walkthrough

The changes add validation checks to prevent negative length values from being processed in the encoding binary module. Specifically, the read_bytes and read_string functions now validate the parsed length immediately after reading it. If a negative length is detected, an error status (Invalid bytes/string length) is set and EINVAL is returned, preventing subsequent buffer allocations and processing operations.

✨ Finishing touches

📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a739f8 and a7ff134.

📒 Files selected for processing (1)

lang/c/src/encoding_binary.c

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Cursor Bugbot

🔇 Additional comments (2)

lang/c/src/encoding_binary.c (2)

130-133: LGTM! Correctly validates negative length before allocation.

The validation properly rejects negative lengths decoded from malicious varint input, preventing the undefined behavior that would occur when casting a negative int64_t to size_t for avro_malloc().

182-185: LGTM! Consistent validation with read_bytes.

The check correctly guards against negative string lengths before buffer allocation.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2026-01-13T20:35:11Z

Code Review - PR #27: [C] Fix negative length validation in binary decoding

Summary

This PR addresses a critical security vulnerability where negative length values from zigzag-decoded varints could be passed to avro_malloc(), causing allocation failures or undefined behavior. The fix adds validation to reject negative lengths before memory allocation.

✅ Strengths

Security Fix - Critical

Correctly addresses a DoS vulnerability where malicious input could crash the application
Prevents undefined behavior from casting negative int64_t to size_t in malloc calls
Clear and appropriate error messages using PRId64 format specifier

Code Quality

Consistent implementation across both read_bytes() and read_string()
Follows existing error handling patterns in the codebase
Uses appropriate error code (EINVAL) for invalid input
Proper formatting and style matching the surrounding code

⚠️ Issues & Recommendations

1. Critical: Missing validation in `skip_bytes()` (encoding_binary.c:144-152)

The skip_bytes() function has the same vulnerability but was not patched:

static int skip_bytes(avro_reader_t reader)
{
    int64_t len = 0;
    int rval;
    check_prefix(rval, read_long(reader, &len),
                 "Cannot read bytes length: ");
    AVRO_SKIP(reader, len);  // ⚠️ No validation of negative len!
    return 0;
}

Impact: An attacker can still trigger undefined behavior via the skip path.

Recommendation: Add the same negative length check to skip_bytes():

if (len < 0) {
    avro_set_error("Invalid bytes length: %" PRId64, len);
    return EINVAL;
}

2. Missing: Test Coverage

No tests were added to verify the fix. This security vulnerability should have regression tests.

Recommendation: Add tests that:

Verify negative length values are rejected with proper error messages
Test boundary conditions (0, -1, INT64_MIN)
Ensure the fix works for both read_bytes() and read_string()
Confirm no memory leaks on error paths

Example test structure:

// Test negative length in read_bytes
avro_reader_t reader = create_reader_with_negative_varint();
char *bytes = NULL;
int64_t len = 0;
int result = read_bytes(reader, &bytes, &len);
assert(result == EINVAL);
assert(bytes == NULL);

3. Consideration: Integer Overflow Protection

While the negative check is good, extremely large positive values could still cause issues:

Casting very large int64_t to size_t on 32-bit systems
Allocation of unreasonably large buffers (e.g., INT64_MAX bytes)

Recommendation (Future Enhancement): Consider adding an upper bound check:

#define MAX_REASONABLE_LENGTH (1024 * 1024 * 1024)  // 1GB or appropriate limit

if (*len < 0 || *len > MAX_REASONABLE_LENGTH) {
    avro_set_error("Invalid bytes length: %" PRId64, *len);
    return EINVAL;
}

4. Minor: Code Location

The validation is placed after check_prefix but before allocation. This is correct, but consider documenting why this ordering is important (to avoid duplicate error messages).

🔒 Security Assessment

Severity: High (DoS vulnerability)
Attack Vector: Crafted Avro binary input with negative varint lengths
Mitigation: Properly validates input before allocation
Residual Risk: skip_bytes() still vulnerable

📊 Performance Considerations

The added validation has negligible performance impact:

Single integer comparison per read operation
No additional memory allocations
Minimal CPU overhead

✅ Best Practices Compliance

✅ Error handling follows existing patterns (avro_set_error, return EINVAL)
✅ Uses portable format specifiers (PRId64)
✅ Consistent style with surrounding code
✅ Clear commit message with security context
❌ Missing test coverage
❌ Incomplete fix (skip_bytes not patched)

🎯 Recommendation

Do not merge yet. Please address:

Critical: Add negative length validation to skip_bytes() (same pattern as this PR)
Important: Add regression tests for negative length handling
Consider: Add upper bound validation for defense in depth

Once these issues are addressed, this will be a solid security fix ready for merge.

🤖 Generated with Claude Code

augmentcode · 2026-01-13T20:36:02Z

🤖 Augment PR Summary

Summary: Adds explicit validation to reject negative lengths while decoding Avro binary bytes and string values.

Technical Notes: On malformed negative lengths, decoding now sets an error message and fails fast with EINVAL to avoid invalid allocations.

_{🤖 Was this summary useful? React with 👍 or 👎}

augmentcode

Review completed. 2 suggestions posted.

Comment augment review to trigger a new review at any time.

augmentcode · 2026-01-13T20:36:03Z

 	int rval;
 	check_prefix(rval, read_long(reader, len),
 		     "Cannot read bytes length: ");
+	if (*len < 0) {


Good to see negative-length validation added for read_bytes/read_string. Note that skip_bytes (and thus skip_string) still accepts a decoded negative length and passes it to AVRO_SKIP, which could behave badly on malformed input.

_{🤖 Was this useful? React with 👍 or 👎}

value:useful; category:security; feedback:The Augment AI reviewer is correct! The same kind of problem as the fixed ones could be seen in the skip_bytes() method. It should be fixed the same way as the others. Prevents a security attack by crafting an Avro file with negative length for a field with a Bytes schema.

augmentcode · 2026-01-13T20:36:03Z

+		avro_set_error("Invalid bytes length: %" PRId64, *len);
+		return EINVAL;
+	}
 	*bytes = (char *) avro_malloc(*len + 1);


avro_malloc is a thin wrapper over realloc and doesn’t guard size overflows; if *len is extremely large, *len + 1 can overflow (UB) and/or truncate when converted to size_t. Consider adding an upper-bound check (also applies to read_string’s str_len + 1) before computing the allocation size.

_{🤖 Was this useful? React with 👍 or 👎}

value:good-to-have; category:security; feedback:The Augment AI reviewer is correct! A specially crafted Avro file with a really big length value will lead to trying to allocate a lot of memory and possible crash due to out-of-memory error.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lang/c/src/encoding_binary.c (1)
144-152: Missing negative length validation in skip_bytes.

The same malicious varint input that could produce negative lengths in read_bytes can also affect skip_bytes. Here, len is passed directly to AVRO_SKIP without validation, which could cause undefined behavior depending on how the macro handles negative values.

For consistency and completeness of the security fix, consider adding the same validation:
Proposed fix
 static int skip_bytes(avro_reader_t reader)
 {
 	int64_t len = 0;
 	int rval;
 	check_prefix(rval, read_long(reader, &len),
 		     "Cannot read bytes length: ");
+	if (len < 0) {
+		avro_set_error("Invalid bytes length: %" PRId64, len);
+		return EINVAL;
+	}
 	AVRO_SKIP(reader, len);
 	return 0;
 }

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a739f8 and a7ff134.

📒 Files selected for processing (1)

lang/c/src/encoding_binary.c

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Cursor Bugbot

🔇 Additional comments (2)

lang/c/src/encoding_binary.c (2)

130-133: LGTM! Correctly validates negative length before allocation.

The validation properly rejects negative lengths decoded from malicious varint input, preventing the undefined behavior that would occur when casting a negative int64_t to size_t for avro_malloc().

182-185: LGTM! Consistent validation with read_bytes.

The check correctly guards against negative string lengths before buffer allocation.

martin-augment · 2026-01-13T20:51:50Z

1. Critical: Missing validation in skip_bytes() (encoding_binary.c:144-152)

The skip_bytes() function has the same vulnerability but was not patched:
static int skip_bytes(avro_reader_t reader)
{
    int64_t len = 0;
    int rval;
    check_prefix(rval, read_long(reader, &len),
                 "Cannot read bytes length: ");
    AVRO_SKIP(reader, len);  // ⚠️ No validation of negative len!
    return 0;
}
Impact: An attacker can still trigger undefined behavior via the skip path.

Recommendation: Add the same negative length check to skip_bytes():
if (len < 0) {
    avro_set_error("Invalid bytes length: %" PRId64, len);
    return EINVAL;
}

value:useful; category:security; feedback:The Claude AI reviewer is correct! The same kind of problem as the fixed ones could be seen in the skip_bytes() method. It should be fixed the same way as the others. Prevents a security attack by crafting an Avro file with negative length for a field with a Bytes schema.

martin-augment · 2026-01-13T20:53:42Z

3. Consideration: Integer Overflow Protection

While the negative check is good, extremely large positive values could still cause issues:

Casting very large int64_t to size_t on 32-bit systems

Allocation of unreasonably large buffers (e.g., INT64_MAX bytes)

Recommendation (Future Enhancement): Consider adding an upper bound check:
#define MAX_REASONABLE_LENGTH (1024 * 1024 * 1024)  // 1GB or appropriate limit

if (*len < 0 || *len > MAX_REASONABLE_LENGTH) {
    avro_set_error("Invalid bytes length: %" PRId64, *len);
    return EINVAL;
}

value:good-to-have; category:security; feedback:The Claude AI reviewer is correct! A specially crafted Avro file with a really big length value will lead to trying to allocate a lot of memory and possible crash due to out-of-memory error.

martin-augment · 2026-01-13T20:55:27Z

144-152: Missing negative length validation in skip_bytes.

The same malicious varint input that could produce negative lengths in read_bytes can also affect skip_bytes. Here, len is passed directly to AVRO_SKIP without validation, which could cause undefined behavior depending on how the macro handles negative values.

value:useful; category:security; feedback:The CodeRabbit AI reviewer is correct! The same kind of problem as the fixed ones could be seen in the skip_bytes() method. It should be fixed the same way as the others. Prevents a security attack by crafting an Avro file with negative length for a field with a Bytes schema.

github-actions bot added the C label Jan 13, 2026

augmentcode bot reviewed Jan 13, 2026

View reviewed changes

coderabbitai bot reviewed Jan 13, 2026

View reviewed changes

Conversation

martin-augment commented Jan 13, 2026

Uh oh!

coderabbitai bot commented Jan 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

claude bot commented Jan 13, 2026

Code Review - PR #27: [C] Fix negative length validation in binary decoding

Summary

✅ Strengths

⚠️ Issues & Recommendations

1. Critical: Missing validation in skip_bytes() (encoding_binary.c:144-152)

2. Missing: Test Coverage

3. Consideration: Integer Overflow Protection

4. Minor: Code Location

🔒 Security Assessment

📊 Performance Considerations

✅ Best Practices Compliance

🎯 Recommendation

Uh oh!

augmentcode bot commented Jan 13, 2026

Uh oh!

augmentcode bot left a comment

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

martin-augment commented Jan 13, 2026

1. Critical: Missing validation in skip_bytes() (encoding_binary.c:144-152)

Uh oh!

martin-augment commented Jan 13, 2026

3. Consideration: Integer Overflow Protection

Uh oh!

martin-augment commented Jan 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai bot commented Jan 13, 2026 •

edited

Loading

1. Critical: Missing validation in `skip_bytes()` (encoding_binary.c:144-152)

1. Critical: Missing validation in `skip_bytes()` (encoding_binary.c:144-152)